A recent paper (Edman et al. [6]) has taken a combinatorial approach to measuring
the anonymity of a threshold mix anonymous communications system. Their paper
looks at ways of matching individual messages sent to individual messages received,
irrespective of user, and determines a measure of the anonymity provided by the system.
Here we extend this approach to include in the calculation information about how many
messages were sent or received by a user, and we define a new metric that can be computed
exactly and efficiently using classical and elegant techniques from combinatorial
enumeration.



1 Introduction

Anonymity networks have evolved to address the problem of anonymous communication
among users. As internet technology becomes more prevalent in everyday life, questions of
privacy and monitoring become more important. An anonymity network provides a means
of communicating confidentially. However, it is still vulnerable to attack. One such attack
attempts to match messages sent with messages received. An exhaustive brute
force attack is inefficient; statistical attacks are reasonably fast and reasonably effective.
Further, there is a need for a metric to measure the amount of anonymity that can be
expected from a system.

A mix network, invented by Chaum [2], is a mechanism for anonymizing the correlation 
between senders and receivers of messages. Messages are sent into the mix where they 
are gathered, permuted and then delivered. There are several mechanisms for doing this,
including a threshold mix, which takes in messages and holds them in a buffer until a
predetermined threshold number of messages is reached, and then sends them all out. The only
possible attacks are based on observation of the input/output behaviour of the mix. We 
assume that it is possible for an adversary to see how many messages a user has sent and 
how many a user has received. 
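To illustrate this threat model, the following Python sketch simulates one round of a threshold mix. It is a toy model built on our own assumptions: the function name `threshold_mix_round` and the `(sender, receiver, payload)` tuple format are hypothetical, and no cryptography is modelled. Because the batch is shuffled before delivery, a passive observer learns only how many messages each user sent and received.

```python
import random
from collections import Counter

def threshold_mix_round(messages, threshold, rng=random):
    """Toy threshold mix: buffer messages until `threshold` have arrived, then
    flush the batch in a random order.

    `messages` is a list of (sender, receiver, payload) tuples. Returns what a
    passive observer of the mix's inputs and outputs sees, namely per-sender and
    per-receiver message counts, or None while the mix is still buffering.
    """
    if len(messages) < threshold:
        return None  # threshold not reached: nothing leaves the mix yet
    batch = list(messages[:threshold])
    rng.shuffle(batch)  # the permutation that hides which input became which output
    sent = Counter(sender for sender, _, _ in batch)
    received = Counter(receiver for _, receiver, _ in batch)
    return sent, received

msgs = [("A1", "B1", "m1"), ("A1", "B2", "m2"), ("A2", "B1", "m3"), ("A3", "B1", "m4")]
sent, received = threshold_mix_round(msgs, threshold=4)
print(sorted(sent.items()))      # [('A1', 2), ('A2', 1), ('A3', 1)]
print(sorted(received.items()))  # [('B1', 3), ('B2', 1)]
```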

The main challenge for breaking anonymity in mix networks is determining whether 
some user Alice is communicating with some user Bob. A secondary consideration can 
be the trajectory of a particular message, either who sent it or who received it. Again 
focusing on Alice and Bob, we can either determine a metric which indicates the likelihood
of correlating messages sent by Alice with messages received by Bob, or we can actually
generate an attack which will attempt to break the system and reveal whom Alice is talking 
to (or who is talking to Bob). 

Metrics allow the user to make an informed choice of anonymity network. They also 
allow an evaluation of how good an anonymity model is. Historically, metrics have often 
considered either the perspective of an individual user or of an individual message. A recent 
paper of Edman et al. expands the view to a system wide approach, but focuses on the 
traffic of individual messages. Here we extend the approach to consider the traffic of sets 
of messages (in particular, messages sent by the same user, Alice, or received by the same 
user, Bob). 

Section 2 introduces our metric, reviews existing metrics and explores how ours differs
from them. Sections 3 and 4 show how to calculate the metric. Section 5 presents and
analyses the data. Section 6 explores future work and delivers a conclusion.

2 Our Metric 

To establish our notation, suppose there are $k$ senders and $\ell$ receivers, the $i$th sender sends
$s_i$ messages in a round and the $j$th receiver receives $r_j$ messages in a round, and that the
total number of messages sent in a round is $n$, i.e. $s_1 + s_2 + \cdots + s_k = r_1 + r_2 + \cdots + r_\ell = n$.
Then we want to know how many ways the $n$ messages could be divided up in this way.
These problems are often modeled, particularly in the statistics literature, in terms of balls and
urns. In this language we want to know how many ways there are to deposit $n$ balls in $\ell$
urns, where there are $k$ different colours of balls: $s_1$ of one colour, $s_2$ of a second colour,
etc., and each urn is to hold a particular number of balls: the first urn holds $r_1$ balls, the
second urn holds $r_2$ balls, etc.

For example, suppose there are three messages, each labeled $\alpha$, sent by $A_1$; three
messages, each labeled $\beta$, sent by $A_2$; and two messages, each labeled $\gamma$, sent by
$A_3$. Suppose $B_1$ receives five messages and $B_2$ receives three messages. Then by
exhaustive, brute force enumeration of all the possibilities, there are nine different ways
this could happen, where the first bracketing is the messages received by $B_1$ and the second
bracketing is the messages received by $B_2$:
$(\alpha,\alpha,\alpha,\beta,\beta)(\beta,\gamma,\gamma)$; $(\alpha,\alpha,\alpha,\beta,\gamma)(\beta,\beta,\gamma)$; $(\alpha,\alpha,\alpha,\gamma,\gamma)(\beta,\beta,\beta)$;
$(\alpha,\alpha,\beta,\beta,\beta)(\alpha,\gamma,\gamma)$; $(\alpha,\alpha,\beta,\beta,\gamma)(\alpha,\beta,\gamma)$; $(\alpha,\beta,\beta,\beta,\gamma)(\alpha,\alpha,\gamma)$;
$(\alpha,\beta,\beta,\gamma,\gamma)(\alpha,\alpha,\beta)$; $(\beta,\beta,\beta,\gamma,\gamma)(\alpha,\alpha,\alpha)$; $(\alpha,\alpha,\beta,\gamma,\gamma)(\alpha,\beta,\beta)$.
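The exhaustive enumeration above can be checked mechanically. The Python sketch below is our own illustration, and the helper name `ways_to_split` is hypothetical. It tries every choice of five of the eight messages for $B_1$ and keeps only the distinct multisets of sender labels, since messages from the same sender are indistinguishable for our purposes. Its cost grows combinatorially with the number of messages, which is why Sections 3 and 4 turn to generating functions.

```python
from collections import Counter
from itertools import combinations

# Three messages from A1 (alpha), three from A2 (beta), two from A3 (gamma).
messages = ["alpha"] * 3 + ["beta"] * 3 + ["gamma"] * 2

def ways_to_split(messages, first_size):
    """Distinct ways to give `first_size` messages to B1 and the rest to B2,
    where only the sender label of each message matters."""
    splits = set()
    for idx in combinations(range(len(messages)), first_size):
        to_b1 = tuple(sorted(messages[i] for i in idx))
        to_b2 = tuple(sorted((Counter(messages) - Counter(to_b1)).elements()))
        splits.add((to_b1, to_b2))
    return splits

splits = ways_to_split(messages, 5)  # B1 receives five messages, B2 the other three
print(len(splits))  # 9
```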

In this type of system, the attacker can gain information by careful observation of the 
volume of messages originating or terminating at a user. Consider at one extreme when n 
messages are sent by Alice and n messages are received by Bob and no messages are sent or 
received by any other users. Then we know with certainty that the messages sent by Alice 
all went to Bob. At the other extreme we have $n$ different senders each sending one message
and $n$ different receivers each receiving one message. In this case the number of possibilities
for sender-receiver pairs is $n!$. But even at intermediate stages, when some messages are
sent by Alice and some messages are sent by others, and some messages are received by
Bob and some messages are received by others, we can count the number of ways this could
happen. Counting this partitioning is actually a very old problem [11] and can be solved in
terms of a variety of generating functions called symmetric functions, as we will discuss in
Sections 3 and 4.

Our metric expresses the degree of anonymity as a ratio, with the denominator representing
the system with the most anonymity. In this metric the most anonymity is provided
by a system in which $n$ messages are sent but each sender sends exactly one message and
each receiver receives exactly one message, as discussed above. In this case there are $n!$
possibilities to match up a sender message with a receiver message. We can informally
define our metric as

$$\deg = \frac{\log(\mathrm{COUNT})}{\log(n!)} \qquad (1)$$

where COUNT is the number of ways for $k$ senders and $\ell$ receivers to exchange $n$
messages if sender $i$ sends $s_i$ messages and receiver $j$ receives $r_j$ messages. We use the
log here to compress the scale for better representation, and to avoid having
numbers that are too large. In Section 3 we will describe in detail how to calculate COUNT.
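As a minimal sketch, assuming COUNT has already been computed (Sections 3 and 4 show how), the metric is a one-line ratio. Following the convention Edman et al. use for their own degree (reviewed below), we return 0 when $n = 1$, since $\log(1!) = 0$. That choice, and the function name `degree`, are our assumptions.

```python
import math

def degree(count, n):
    """log(COUNT) / log(n!); the base of the logarithm cancels in the ratio.
    Returns 0.0 for n <= 1, where log(n!) = 0 (an assumed convention)."""
    if n <= 1:
        return 0.0
    return math.log(count) / math.log(math.factorial(n))

n = 7
print(degree(1, n))                  # 0.0: Alice sends all n messages to Bob
print(degree(math.factorial(n), n))  # 1.0: n one-message senders and receivers
```

The two calls correspond to the two extremes discussed above: a single possible assignment gives 0, and the $n!$ possibilities of the fully mixed case give 1.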

This metric is simple and straightforward to understand and calculate. It is a system 
wide metric that measures the anonymity afforded by the system as a whole, rather than 
the anonymity afforded to a single user. 

We now review existing metrics. Note that various perspectives are possible, e.g., the 
anonymity of an individual user, the anonymity of an individual message, or the anonymity 
of the system as a whole. The anonymity metric seeks to distill into a single number the 
strength of the network with respect to protecting its users' anonymity. This number is 
referred to as the degree of anonymity and was first proposed by Reiter and Rubin [12].
Their degree of anonymity requires a probability $p$ assigned to each potential sender and is
defined as $1 - p$ for each user. A more systematic approach due to Berthold et al. [1] gives
the degree of anonymity for a system of $N$ users as $A = \log_2(N)$. These metrics require
estimates of properties of the system and can be imprecise.

The next step in metric evolution is the information theoretic (or entropy based) metrics. 
Serjantov and Danezis [13] define a metric $S = -\sum_{u=1}^{n} p_u \log_2(p_u)$, where $n$ is the number
of users and $p_u$ is the probability that user $u$ was the sender or the receiver in a message
exchange. This metric is called the effective anonymity set size and it measures the entropy
of the system. Recall that information entropy, as defined by Shannon, reflects the average
information gained over a sequence of symbols, each having some probability. In this case,
we have the probability that a specific user $u$ has sent a message. In the best case, this
value will be equal to $\log_2(n)$, and grows with $n$. In the worst case (a user never sends,
or is the only one to send) it will be 0.

An improvement on this approach is the normalized metric of Diaz et al. [5], called the
degree of anonymity, which is defined as

$$\deg = \frac{S}{S_{\max}} = \frac{-\sum_{u=1}^{n} p_u \log_2(p_u)}{\log_2(n)}$$

where the term $S_{\max}$ is the maximum entropy of the system, which is $\log_2(n)$. This division
normalizes the result of Serjantov and Danezis, restricting the range to $[0, 1]$ independent
of $n$.
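The two entropy-based measures can be sketched as follows, assuming the probability distribution over users is already known (estimating it is exactly the difficulty discussed next). The function names are ours, and terms with $p_u = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import math

def effective_anonymity_set_size(probs):
    """Serjantov-Danezis entropy S = -sum_u p_u log2(p_u), in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def diaz_degree(probs):
    """Diaz et al. degree S / S_max, with S_max = log2(n) for n users."""
    n = len(probs)
    if n < 2:
        return 0.0  # assumption: log2(1) = 0 leaves the ratio undefined
    return effective_anonymity_set_size(probs) / math.log2(n)

uniform = [0.25] * 4
print(effective_anonymity_set_size(uniform))  # 2.0, i.e. log2(4): the best case
print(diaz_degree(uniform))                   # 1.0
print(diaz_degree([1.0, 0.0, 0.0, 0.0]))      # 0.0: one user is certainly the sender
```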

There are drawbacks to these formulas. In this last case, it can be argued that, because 
of the normalization, it becomes easier to compare results but, at the same time, the number 
of users appears to become irrelevant. It also does not consider the users who have sent 
messages vs. the set of possible users: these metrics reflect a snapshot of the use of the 
system. Such a snapshot is also necessary to evaluate the probabilities required to compute 
the formula, but it is difficult to determine the degree of confidence we can have in such 
estimates, i.e. their quality. 



A different approach is the combinatorial metric of Edman et al. [6]. They first define
a bipartite graph, $G = (V_1, V_2, E)$, where $V_1$ is the set of sent messages and $V_2$ is the set of
received messages. There is an edge between two messages $s_i$ and $t_j$ if the sent message $s_i$
could be the same as the received message $t_j$. Then this graph has an adjacency matrix,
$A = (a_{ij})_{n \times n}$, where the rows are indexed by the sent messages $V_1$ and the columns are
indexed by the received messages $V_2$. A perfect matching in a graph is a subset of edges
such that every vertex is incident to exactly one edge in the subset. In a bipartite graph
this amounts to a pairing off of each vertex in set $V_1$ with exactly one vertex in set $V_2$.

In a bipartite graph it is well known how to count the number of perfect matchings: they
are counted by a mathematical function, the permanent, defined as

$$\operatorname{per}(A) = \sum_{\sigma} \prod_{i=1}^{n} a_{i,\sigma(i)}$$

where the sum is over all permutations $\sigma$ of $\{1, \ldots, n\}$ and $A = (a_{ij})_{n \times n}$ is the adjacency matrix
of the bipartite graph. The reason the permanent works is as follows: every permutation
selects exactly one entry from each row and each column, so it pairs each vertex in
$V_1$ with exactly one vertex in $V_2$. If any of the selected entries is zero
(i.e. there is no edge between those two vertices) then the product is zero and there is no
perfect matching associated with that permutation. Conversely, if all selected entries are one then
this permutation describes a perfect matching.
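The definition translates directly into code. The following Python sketch (the function name is ours) sums over all $n!$ permutations, so it is only practical for small matrices, but it makes the argument above concrete: for a 0-1 matrix each permutation contributes 1 exactly when it is a perfect matching.

```python
from itertools import permutations
from math import prod

def permanent(a):
    """per(A) = sum over permutations sigma of prod_i a[i][sigma(i)].
    Direct summation: O(n * n!) time."""
    n = len(a)
    return sum(prod(a[i][sigma[i]] for i in range(n)) for sigma in permutations(range(n)))

# Three sent messages (rows) and three received messages (columns).
a = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
print(permanent(a))  # 2: rows 1 and 2 can swap columns 1 and 2; row 3 must use column 3
```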

Edman et al. define a combinatorial degree of anonymity as follows:

$$\deg(A) = \begin{cases} 0 & \text{if } n = 1, \\ \dfrac{\log(\operatorname{per}(A))}{\log(n!)} & \text{otherwise.} \end{cases}$$

As with the degree of anonymity of Diaz et al. the measure reflects a ratio of the actual 
measurement over the ideal case. The denominator is a reflection of the fact that the 
system providing the most anonymity is the one in which each sent message is potentially 
connected to each received message, i.e. the complete bipartite graph. Then the $n \times n$
adjacency matrix is the all-ones matrix and the number of perfect matchings is equal to $n!$,
the number of permutations of $n$.
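A sketch of Edman et al.'s degree follows. To avoid the $n!$ terms of the direct definition, it evaluates the permanent with Ryser's inclusion-exclusion formula, a standard technique we substitute here (it is not part of Edman et al.'s definition), which needs on the order of $2^n n^2$ operations. The function names are hypothetical.

```python
import math

def permanent_ryser(a):
    """Ryser's formula: per(A) = (-1)^n * sum over nonempty column sets S of
    (-1)^|S| * prod_i (sum of row i's entries in the columns of S)."""
    n = len(a)
    total = 0
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if mask >> j & 1]
        term = 1
        for row in a:
            term *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * term
    return (-1) ** n * total

def edman_degree(a):
    """0 if n = 1, else log(per(A)) / log(n!). Assumes per(A) >= 1, which holds
    whenever the actual matching is among the graph's edges."""
    n = len(a)
    if n == 1:
        return 0.0
    return math.log(permanent_ryser(a)) / math.log(math.factorial(n))

complete = [[1] * 4 for _ in range(4)]
identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(edman_degree(complete))  # 1.0: complete bipartite graph, per(A) = 4! = 24
print(edman_degree(identity))  # 0.0: exactly one perfect matching
```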

Edman et al. then generalize their definition to matrices with entries which are probabilities
(doubly stochastic matrices). In this model the probability in position $i,j$ is the
probability that the edge between $s_i$ and $t_j$ is in a perfect matching. Here, as in the unweighted
case, they take a product of entries. There are a number of concerns with this
approach, some of which our approach corrects.

While the permanent counts perfect matchings in the unweighted case, it is not clear 
which statistic is counted in the weighted case, since it is merely the sum of products of 
terms in the adjacency matrix. In the case of a 0-1 matrix the permanent terms are the 
products of zeros and ones. But the reason there is one term for each perfect matching 
is that this procedure is essentially a logical AND. That is, a single zero will make the 
product of the entire set zero. So while the entries are technically multiplied, they could 
just as easily be ANDed to the same effect. In generalizing to the non 0-1 case it is not 
clear why multiplication should be the operation of choice to combine elements, nor what 
it counts. 

Furthermore, this approach requires the calculation of probabilities for each edge and
this in itself can be problematic. Are these probabilities estimated (with all the inherent
issues of inaccuracy)? Are they calculated, say using an approach such as statistical disclosure?
If so, what is the complexity of this approach and what is the quality of the results it
provides?

Moreover, the perfect matching approach of Edman et al. considers a sent message, $s_i$,
being matched to a received message and counts $s_i$ being matched to received message $t_u$
as different from $s_i$ being matched to $t_v$, even if $t_u$ and $t_v$ are received by the same user.
Certainly they are different messages, but if the goal is to determine who is communicating 
with whom, the important part is to determine that one of the many messages Alice sent 
is one of the many messages Bob received. 

The recent work of Gierlichs et al. [8] refines the metric of Edman et al. to account for
many messages sent and received by each user. To do so, Gierlichs et al. look
at equivalence classes of perfect matchings. This is actually the same situation as what
we have already discussed. For example, if user $A_1$ sends 2 messages and user $A_2$ sends
3 messages, while user $B_1$ receives 2 messages, user $B_2$ receives 2 messages, and user $B_3$
receives one message, and each $A$ user could potentially communicate with each $B$ user,
then there are $5!$ perfect matchings possible to pair up the messages sent with the messages
received.

The authors denote the correct perfect matching by $M_c$, each equivalence class by $[M_p]$,
and its cardinality by $|[M_p]| = C_p$. The total number of equivalence classes is determined
by the problem. In the example, then, there are 5 equivalence classes with
cardinalities $C_1 = 12$, $C_2 = 48$, $C_3 = 24$, $C_4 = 12$, $C_5 = 24$.

$[M_1] = [(A_1,B_1), (A_1,B_1), (A_2,B_2), (A_2,B_2), (A_2,B_3)]$

$[M_2] = [(A_1,B_1), (A_1,B_2), (A_2,B_1), (A_2,B_2), (A_2,B_3)]$

$[M_3] = [(A_1,B_1), (A_1,B_3), (A_2,B_1), (A_2,B_2), (A_2,B_2)]$

$[M_4] = [(A_1,B_2), (A_1,B_2), (A_2,B_1), (A_2,B_1), (A_2,B_3)]$

$[M_5] = [(A_1,B_2), (A_1,B_3), (A_2,B_1), (A_2,B_1), (A_2,B_2)]$



where we have used the notation $(A_i, B_j)$ to mean a pair consisting of an element from $A_i$
and an element from $B_j$, so, for example, $(A_2, B_1)$ is the set of all pairs $(a_2^1, b_1^1)$, $(a_2^1, b_1^2)$,
$(a_2^2, b_1^1)$, $(a_2^2, b_1^2)$, $(a_2^3, b_1^1)$, $(a_2^3, b_1^2)$, where $a_2^i$ means the $i$th message sent by user $A_2$
and $b_1^j$ the $j$th message received by user $B_1$.

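The authors define the system's anonymity level, $d^*(A)$, as

$$d^*(A) = \frac{-\sum_{p=1}^{P} \Pr(M_c \in [M_p]) \cdot \log(\Pr(M_c \in [M_p]))}{\log(n!)}$$

if $n > 1$, and as 0 if $n = 1$, where $P$ is the number of equivalence classes and $\Pr(M_c \in [M_p]) = C_p / n!$.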
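For small examples the equivalence classes can be found by brute force. The Python sketch below is our own illustration, not the divide-and-conquer algorithm of Gierlichs et al. It enumerates all $5! = 120$ perfect matchings of the example above, groups them by the multiset of (sender, receiver) user pairs they induce, and evaluates $d^*(A)$ assuming every perfect matching is equally likely, so that $\Pr(M_c \in [M_p]) = C_p/n!$.

```python
import math
from collections import Counter
from itertools import permutations

# A1 sends 2 messages, A2 sends 3; B1 receives 2, B2 receives 2, B3 receives 1.
sent = ["A1", "A1", "A2", "A2", "A2"]      # owner of each sent message
received = ["B1", "B1", "B2", "B2", "B3"]  # owner of each received message
n = len(sent)

# A perfect matching sends message i to received message sigma[i]; its class is
# the sorted multiset of (sender, receiver) user pairs it induces.
classes = Counter(
    tuple(sorted((sent[i], received[sigma[i]]) for i in range(n)))
    for sigma in permutations(range(n))
)
print(sorted(classes.values()))  # [12, 12, 24, 24, 48]

probs = [c / math.factorial(n) for c in classes.values()]
d_star = -sum(p * math.log(p) for p in probs) / math.log(math.factorial(n))
```

The loop visits all $n!$ matchings, so this approach does not scale; it only confirms the five classes and their cardinalities for the example.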

It is worth noting that Franz et al. [7] also take a counting approach, although they
do not treat the full generality of sender/receiver patterns. In one instance they look at all
possible combinations of senders for a given set of messages. In another instance they count
senders sending various combinations of messages but do not consider receivers receiving
several messages. While some ideas are similar to ours and could be expanded further
using enumerative techniques, they do not have the full generality of our approach. In fact,
the classical enumerative methods used by Franz et al. and Gierlichs et al. work to
count when either the senders or the receivers are fixed at sending or receiving one message
each; to handle the full generality of both receivers and senders dealing in multiple
messages, one needs symmetric functions, as discussed below.

Like Gierlichs et al., we suppose that senders and receivers send and receive many messages;
like Edman et al., we use a combinatorial metric, and we ask "how many ways are there for these
senders to send to these receivers?". This differs from Gierlichs et al., who use an entropy based
metric, asking instead "what is the probability that this perfect matching is the right one?". Their
approach requires the calculation of two parameters: equivalence classes and cardinality
(see Appendix B for a discussion of a way of calculating cardinality). To do this they
provide a divide-and-conquer algorithm that they note becomes rather expensive for large
$n$. Indeed, in their conclusions they suggest that a more efficient algorithm remains an open
problem. Our method essentially calculates the number of equivalence classes; however,
it does so without explicitly enumerating them, so the approach is extremely fast and
streamlined. It provides a rapid but accurate measure of the anonymity of the system. As
we discuss in Section 5, numerous trends can be discerned and this provides an interesting
focus for future work.

3 Calculating our Metric 

We now turn our attention to determining COUNT as defined in Section 2. The calculation
of it is straightforward; however, it requires some "heavy machinery" from combinatorial 
enumeration, namely generating functions and symmetric functions (which are a special 
type of generating function). We briefly review generating functions before discussing the 
appropriate one for this particular problem. Excellent introductions to generating functions
can be found in [9], [4] or [17].

A generating function is a sum of powers of $x$ where the coefficient of $x^i$ counts how
many items of size $i$ there are. In a sense the powers of $x$ are merely placeholders, with the
$i$th power holding the place for items of size $i$, and the $x$'s are not expected to be evaluated.
For example, if there are four ways of having two messages delivered, three ways of having
one message delivered, and one way of having no messages delivered, then the generating
function is $1 + 3x + 4x^2$. For a counting problem, a generating function is set up that models
the problem and then the required coefficient is extracted. The notation $[x^i]$ means "the
coefficient of $x^i$". Thus in the example, $[x^2](1 + 3x + 4x^2)$ gives us the value 4.

Generating functions have the advantage that they encode all the enumerative infor- 
mation and they can easily be manipulated, e.g. multiplied together. The extraction of a 
coefficient can prove to be a challenge sometimes if a direct formula for it is not easily ob- 
tainable. In this case a symbolic computation program such as Maple can be an important 
tool. 
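For a single variable, a generating function can be stored as a list of coefficients, with the entry at index $i$ holding the coefficient of $x^i$. The minimal Python sketch below (helper names are ours) shows coefficient extraction and the multiplication mentioned above. For the multivariate symmetric functions used later, a symbolic package such as Maple is more convenient.

```python
def poly_mul(p, q):
    """Product of two generating functions given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b  # x^i * x^j contributes to x^(i+j)
    return out

def coeff(p, i):
    """[x^i] p: the coefficient of x^i, or 0 beyond the stored degree."""
    return p[i] if 0 <= i < len(p) else 0

f = [1, 3, 4]               # 1 + 3x + 4x^2
print(coeff(f, 2))          # 4
print(poly_mul(f, [1, 1]))  # [1, 4, 7, 4], i.e. (1 + 3x + 4x^2)(1 + x)
```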

Turning to our specific problem, the generating function allows us, given the number of 
messages sent and received by various users, to determine exactly the number of ways this 
could take place. If there are a lot of ways for this to take place then the system does not
leak much information and remains relatively anonymous. If there are only a few ways for
this to take place then the system is leaking a lot of information.

We define our degree of anonymity precisely as follows. As mentioned above, we take a
ratio with the denominator representing the system with the most anonymity, i.e. a system
in which each user sends or receives a single message. In this system our approach is no
better than counting perfect matchings and there are $n!$ possibilities. The numerator is the
logarithm of the number of ways the $n$ messages could be divided up. This is the coefficient of
$x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}$ in the generating function, GF, for the number of ways of $n$ messages being
received such that the $j$th user receives $r_j$ messages, for $1 \le j \le \ell$. Thus the degree is

$$\deg_A = \frac{\log\left([x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}]\,\mathrm{GF}\right)}{\log(n!)}$$

Now we consider the form of the generating function, GF, for this problem. This
generating function is a special type of function called a symmetric function. First, a number
of further definitions are required. A symmetric function $f(x)$ in variables $x_1, x_2, \ldots$ is a
function such that a permutation of the variables does not change the value of the function,
i.e. $f(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(k)}) = f(x_1, x_2, \ldots, x_k)$. Then the complete homogeneous symmetric function
of degree $m$, $h_m(x)$, is the sum of all monomials of total degree $m$ in $x = x_1, x_2, \ldots$, i.e.

$$h_1(x_1,x_2,x_3) = x_1 + x_2 + x_3$$

$$h_2(x_1,x_2,x_3) = x_1^2 + x_2^2 + x_3^2 + x_1x_2 + x_1x_3 + x_2x_3$$

$$h_3(x_1,x_2,x_3) = x_1^3 + x_2^3 + x_3^3 + x_1^2x_2 + x_1^2x_3 + x_2^2x_1 + x_2^2x_3 + x_3^2x_1 + x_3^2x_2 + x_1x_2x_3$$



The homogeneous symmetric functions can also be defined for infinite sets of variables.
Furthermore, for $\lambda = \lambda_1, \lambda_2, \ldots, \lambda_m$ a partition of $n$, where $\lambda_1 + \lambda_2 + \cdots + \lambda_m = n$ (i.e. a
nondecreasing sequence of nonnegative integers that sum to $n$), $h_\lambda$ is defined as the
product $h_{\lambda_1} h_{\lambda_2} \cdots h_{\lambda_m}$.

Theorem 1 Given that $s_i$ messages are sent by the $i$th sender, $1 \le i \le k$, and that $r_j$
messages are received by the $j$th receiver, $1 \le j \le \ell$, in a round, the number of ways this could
happen is

$$[x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}]\, h_{r_1}(x) h_{r_2}(x) \cdots h_{r_\ell}(x)$$

where $x$ is $x_1, x_2, \ldots, x_k$ and $h_q(x)$ is the homogeneous symmetric function of degree $q$.

Proof: The term $h_m(x_1, x_2, \ldots, x_k)$ counts the number of different ways $m$ elements could
be received where the elements are drawn from $1, 2, \ldots, k$ (e.g. if there were three elements
received and two possible kinds of elements, then this is $h_3(x_1, x_2) = x_1^3 + x_2^3 + x_1^2x_2 + x_1x_2^2$).

The product $h_{r_1}(x) h_{r_2}(x) \cdots h_{r_\ell}(x)$ is, by the product lemma in enumerative combinatorics
[9, pp. 36-37], the generating function for the number of ways of one user receiving $r_1$
elements, a second user receiving $r_2$ elements, etc., simultaneously.

The term $[x_i^{s_i}]$ denotes the coefficient of $x_i^{s_i}$ in the expression. This counts the number
of ways $s_i$ copies of $i$ (messages from sender $i$) could be sent.

The entire expression in the statement of the theorem thus counts the number of ways that,
for every $i$, $s_i$ messages could be sent by sender $i$ and, for every $j$, $r_j$ messages could be
received by receiver $j$.

QED
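Theorem 1 can be applied literally by multiplying polynomials and reading off one coefficient, which is the "multiply and collect terms" approach used in the example below. The Python sketch that follows stores a polynomial as a dictionary from exponent tuples to coefficients. The function names are ours, and the pruning step (discarding terms whose exponent already exceeds some $s_i$) is an optimisation we add; it is valid because exponents only grow as further factors are multiplied in.

```python
from collections import Counter
from itertools import combinations_with_replacement

def h(m, k):
    """h_m(x_1, ..., x_k) as {exponent tuple: coefficient}: one monomial, with
    coefficient 1, for each multiset of size m drawn from the k sender labels."""
    poly = {}
    for choice in combinations_with_replacement(range(k), m):
        counts = Counter(choice)
        poly[tuple(counts[i] for i in range(k))] = 1
    return poly

def mul(p, q):
    """Multiply two polynomials in the dictionary representation, collecting terms."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return out

def count_ways(senders, receivers):
    """[x_1^{s_1} ... x_k^{s_k}] h_{r_1}(x) ... h_{r_l}(x), as in Theorem 1."""
    k = len(senders)
    product = {(0,) * k: 1}
    for r in receivers:
        product = {
            e: c for e, c in mul(product, h(r, k)).items()
            if all(a <= s for a, s in zip(e, senders))  # prune: exponent already too large
        }
    return product.get(tuple(senders), 0)

print(len(h(3, 3)))                   # 10 monomials, as in h_3(x_1, x_2, x_3) above
print(count_ways([3, 3, 2], [5, 3]))  # 9, the example of Section 2
```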

Thus the generating function we require is a symmetric function and we can define our
degree of anonymity to be

$$\deg_A = \frac{\log\left([x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}]\, h_{r_1}(x) h_{r_2}(x) \cdots h_{r_\ell}(x)\right)}{\log(n!)} \qquad (3)$$

where there are $k$ senders and $\ell$ receivers, the $i$th sender sends $s_i$ messages in a round and
the $j$th receiver receives $r_j$ messages in a round, and the total number of messages sent
in a round is $n$, i.e. $s_1 + s_2 + \cdots + s_k = r_1 + r_2 + \cdots + r_\ell = n$.

Consider an example. Recall that earlier we showed the example of an input/output
round of three messages sent by $A_1$, three messages sent by $A_2$, two messages sent by $A_3$,
five messages received by $B_1$, and three messages received by $B_2$. In our generating function
terms this means that we need two complete homogeneous symmetric functions: $h_5$ for $B_1$ and $h_3$ for
$B_2$ (since $B_1$ receives five messages and $B_2$ receives three messages). Since there are three
users sending messages, the number of variables for each generating function is limited to
three. Since we know that $A_1$ sends three messages, $A_2$ sends three messages, and $A_3$ sends
two messages, we require the coefficient of $x_1^3 x_2^3 x_3^2$. Specifically,



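$$\begin{aligned}
h_5(x_1,x_2,x_3) = {} & x_1^5 + x_2^5 + x_3^5 + x_1^4x_2 + x_1^4x_3 + x_2^4x_1 + x_2^4x_3 + x_3^4x_1 + x_3^4x_2 \\
& + x_1^3x_2^2 + x_1^3x_3^2 + x_2^3x_1^2 + x_2^3x_3^2 + x_3^3x_1^2 + x_3^3x_2^2 \\
& + x_1^3x_2x_3 + x_2^3x_1x_3 + x_3^3x_1x_2 + x_1^2x_2^2x_3 + x_1^2x_3^2x_2 + x_2^2x_3^2x_1
\end{aligned}$$

$$\begin{aligned}
h_3(x_1,x_2,x_3) = {} & x_1^3 + x_2^3 + x_3^3 + x_1^2x_2 + x_1^2x_3 + x_2^2x_1 + x_2^2x_3 + x_3^2x_1 + x_3^2x_2 \\
& + x_1x_2x_3
\end{aligned}$$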

We can multiply these two generating functions together and collect terms (admittedly a
slow process, but we will improve on it in Section 4). This approach shows
that $h_5(x_1,x_2,x_3)\,h_3(x_1,x_2,x_3)$ has nine terms of the form $x_1^3x_2^3x_3^2$, formed from the following
products: $(x_1^3x_2^2)(x_2x_3^2)$; $(x_1^3x_2x_3)(x_2^2x_3)$; $(x_1^3x_3^2)(x_2^3)$; $(x_1^2x_2^3)(x_1x_3^2)$; $(x_1^2x_2^2x_3)(x_1x_2x_3)$;
$(x_1x_2^3x_3)(x_1^2x_3)$; $(x_1x_2^2x_3^2)(x_1^2x_2)$; $(x_2^3x_3^2)(x_1^3)$; $(x_1^2x_2x_3^2)(x_1x_2^2)$.

Thus $[x_1^3x_2^3x_3^2]\,h_5(x_1,x_2,x_3)\,h_3(x_1,x_2,x_3) = 9$. Note that these can be matched exactly with
the $\alpha$, $\beta$, $\gamma$ terms obtained in Section 2 through direct enumeration of the various
possibilities.

Also compare this approach with the permanent-based approach that considers all pos- 
sible combinations of messages sent and received. Since there are eight messages involved, 
there are 8! = 40320 ways to send them, a substantially larger number of possibilities. 
Our degree of anonymity is log 9/ log 40320 = 0.954/4.605 = 0.207 whereas the degree of 
anonymity of Edman et al. is 1. 



4 Extracting the Coefficient 



Recall from Section 3 that the generating function for the problem is the homogeneous
symmetric function $h_\lambda$ and that in order to evaluate our degree of anonymity, $\deg_A$, in
equation (3) we need to extract the coefficient. This section explores the theoretical basis
for this extraction and explains the calculation that needs to be made. 

Symmetric functions form a graded ring. The most natural basis for this ring is the set
of monomial symmetric functions. The monomial symmetric functions, $m_\lambda$, are defined as
$m_\lambda(x) = \sum_\alpha x^\alpha$, where the sum ranges over all distinct permutations $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$
of the entries of the partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$. For example,

$$\begin{aligned}
m_{(2,1,1)}(x_1,x_2,x_3,x_4) = {} & x_1^2x_2x_3 + x_1^2x_2x_4 + x_1^2x_3x_4 + x_2^2x_1x_3 + x_2^2x_1x_4 + x_2^2x_3x_4 \\
& + x_3^2x_1x_2 + x_3^2x_1x_4 + x_3^2x_2x_4 + x_4^2x_1x_2 + x_4^2x_1x_3 + x_4^2x_2x_3.
\end{aligned}$$

As a basis, then, we can write any symmetric function of degree $n$, $f(x)$, as $\sum_\lambda f_\lambda m_\lambda(x)$,
where the sum is over all partitions $\lambda$ of $n$.

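There is also a very natural scalar product defined on this ring. It is defined such that
$\langle m_\lambda(x), h_\mu(x) \rangle = \delta_{\lambda\mu}$, where $\delta_{\lambda\mu}$ equals 1 if $\lambda = \mu$ and 0 otherwise. Note in particular
that, although the monomial symmetric functions and the homogeneous symmetric
functions are both bases for the ring of symmetric functions, neither is an orthonormal basis
with this scalar product. This scalar product, however, allows us to extract the coefficients
of a symmetric function $f(x)$. Suppose we want the coefficient $f_\mu$ of $m_\mu(x)$ in $f(x)$. Then

$$\begin{aligned}
\langle f(x), h_\mu(x) \rangle &= \Big\langle \sum_\lambda f_\lambda m_\lambda(x), h_\mu(x) \Big\rangle && (4) \\
&= \sum_\lambda f_\lambda \langle m_\lambda(x), h_\mu(x) \rangle && (5) \\
&= f_\mu. && (6)
\end{aligned}$$

Since $f$ is symmetric, $f_\mu$ is also the coefficient of $x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}$ in $f(x)$ whenever $\mu$ is the
partition formed by rearranging $s_1, s_2, \ldots, s_k$, and in that case $h_{s_1} h_{s_2} \cdots h_{s_k} = h_\mu$.
Thus to compute the number of ways $k$ senders could send $s_1, s_2, \ldots, s_k$ messages and $\ell$
receivers could receive $r_1, r_2, \ldots, r_\ell$ messages, we calculate the scalar product
$\langle h_{s_1} h_{s_2} \cdots h_{s_k},\ h_{r_1} h_{r_2} \cdots h_{r_\ell} \rangle$.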
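The authors used Maple and Stembridge's SF package for this step. As a hedged, package-free alternative, the Python sketch below evaluates $\langle h_{s_1} \cdots h_{s_k}, h_{r_1} \cdots h_{r_\ell} \rangle$ through a classical identity that we bring in here: this scalar product equals the number of $k \times \ell$ matrices of nonnegative integers with row sums $s_1, \ldots, s_k$ and column sums $r_1, \ldots, r_\ell$ (an entry is the number of messages from one sender to one receiver). The function names are hypothetical and this is not the SF API.

```python
from functools import lru_cache

def splits(remaining, total):
    """Every way to take `total` units from the rows, at most remaining[i] from
    row i; yields the tuple of amounts left in each row."""
    if not remaining:
        if total == 0:
            yield ()
        return
    first, rest = remaining[0], remaining[1:]
    for take in range(min(first, total) + 1):
        for tail in splits(rest, total - take):
            yield (first - take,) + tail

def scalar_product_h(senders, receivers):
    """<h_{s_1}...h_{s_k}, h_{r_1}...h_{r_l}> as the number of nonnegative integer
    matrices with row sums `senders` and column sums `receivers`."""
    receivers = tuple(receivers)
    if sum(senders) != sum(receivers):
        return 0

    @lru_cache(maxsize=None)
    def fill(col, remaining):
        # Fill column `col`; `remaining` holds what each sender has yet to send.
        if col == len(receivers):
            return 1 if not any(remaining) else 0
        return sum(fill(col + 1, left) for left in splits(remaining, receivers[col]))

    return fill(0, tuple(senders))

print(scalar_product_h([3, 3, 2], [5, 3]))  # 9
```

Because the matrix count is unchanged by transposition, the same function also illustrates why exchanging the roles of senders and receivers leaves the coefficient unchanged.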

The step-by-step justification for the procedure to calculate the number of ways $k$ senders
and $\ell$ receivers send $n$ messages in a round, such that the $i$th sender sends $s_i$ messages and
the $j$th receiver receives $r_j$ messages, is as follows:

1. By Theorem 1 this number can be represented by $[x_1^{s_1} x_2^{s_2} \cdots x_k^{s_k}]\, h_{r_1}(x) h_{r_2}(x) \cdots h_{r_\ell}(x)$.

2. By equation (6) this coefficient is equal to the scalar product of
the generating function representing senders and the generating function representing
receivers.

3. By the proof of Theorem 1 and the comments before Theorem 1, the complete homogeneous
symmetric function $h_s$ is the generating function for a sender sending $s$ messages, and
$h_{s_1} h_{s_2} \cdots h_{s_k}$ is the generating function for $k$ senders with the $i$th sender sending $s_i$
messages. Similarly for receivers.

Now that we can extract the coefficient via the scalar product, we can calculate our
degree of anonymity, $\deg_A$. However, with the exception of a few special cases, obtaining a
closed form expression for the scalar product is difficult. The alternative is to use a symbolic 
computation package, such as Maple, to calculate the scalar product. In the next section 
we outline the results we obtained using such a program. The computations presented here 
were carried out using Maple 8 and the symmetric functions package, SF, written by John 
Stembridge [15] . All of the calculations mentioned ran in a few seconds or less. 



5 Data Analysis 



We have already discussed the extremes of the metric (i.e. when Alice sends all messages, or 
when each user sends exactly one) and have discussed how it discerns between cases better 
than the metric of Edman et al. In this section we conduct a number of experiments on the 
metric and discover a number of patterns and trends. In particular we explore what the 
metric looks like, some interesting features of it, and answers to some interesting questions. 

The two figures, Figure 1 and Figure 2, illustrate the behaviour of the metric from
more favourable (anonymous) to less favourable situations, based on two different scenarios
where Alice's communications become predominant. The first considers the case where the
number of messages sent by Alice increases but the total number of messages stays the same.
The second considers the case where the number of messages sent by Alice increases and the
total number of messages also increases. Both show that the metric becomes approximately linear
as the values move away from perfect anonymity.

Figure 1 (Evolution of Metric with Proportion of Messages Sent): For a total of 15 messages
sent to distinct receivers, evolution as Alice sends growing numbers of these messages.

Figure 2 (Evolution of Metric with Increase in Messages by Alice Only): For seven distinct
senders, including Alice, and one receiver per message, evolution as Alice sends from one to
13 messages and all others send one.
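When every receiver gets exactly one message, as in both figures, COUNT reduces to the multinomial coefficient $n!/(s_1! s_2! \cdots s_k!)$, so the two scenarios can be sketched without symbolic computation. The Python below is our reconstruction, not the authors' code. The function name is hypothetical, and for Figure 1 we assume that each of the $15 - a$ messages not sent by Alice has its own sender, which the caption does not state. It computes the metric values only and does not reproduce the plots.

```python
import math

def degree_singleton_receivers(sender_counts):
    """Degree when every receiver gets one message, so COUNT = n! / prod(s_i!).
    Uses lgamma(m + 1) = log(m!) to avoid huge integers (requires n >= 2)."""
    n = sum(sender_counts)
    log_count = math.lgamma(n + 1) - sum(math.lgamma(s + 1) for s in sender_counts)
    return log_count / math.lgamma(n + 1)

# Figure 1 (assumed reading): 15 messages in total, Alice sends a of them.
fig1 = [degree_singleton_receivers([a] + [1] * (15 - a)) for a in range(1, 16)]

# Figure 2: Alice sends a = 1, ..., 13 messages; six other senders send one each.
fig2 = [degree_singleton_receivers([a] + [1] * 6) for a in range(1, 14)]
```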

Now we turn to features. This metric has many interesting features, as illustrated by 
the following example for n = 7. We have calculated the coefficients for all sender-receiver 
combinations for n = 7 messages and the results are shown in a table in the Appendix. 



Note that in the table we have taken advantage of symmetry, i.e. $s_1, s_2, \ldots, s_k$ senders and
$r_1, r_2, \ldots, r_\ell$ receivers gives the same coefficient as $r_1, r_2, \ldots, r_\ell$ senders and $s_1, s_2, \ldots, s_k$
receivers. We have worked with n = 7 because the number of cases to consider is tractable
and the coefficients obtained are small enough to be meaningful; however, the calculations 
also run in seconds on larger values of n. The table shows clearly that, as one would expect, 
a lot of senders sending a few messages each (or a lot of receivers, receiving few messages 
each) results in the most anonymity. A closer examination of the table reveals a number of 
interesting facts. 

First, coalescing a sender of a single message into another sender (e.g. going from
1,1,1,1,1,1,1 to 1,1,1,1,1,2) cuts the coefficient by a factor equal to the new
sender's number of messages, a dramatic reduction in the anonymity. More precisely, when every
receiver receives a single message, going from $1, k$ to $k + 1$ divides the coefficient by $k + 1$.
The same applies to receivers when every sender sends a single message. The degree is cut by
a smaller amount because the log smooths the function.

Second, there are also counterintuitive instances where more senders (resp. receivers),
each sending (resp. receiving) fewer messages, do not give more anonymity. For example,
1,1,1,1,1,1,1; 1,2,2,2 has a coefficient of 630 and degree 0.756, while 1,1,1,1,1,2; 1,1,1,2,2
has a coefficient of 690 and degree 0.767, yet the first sender-receiver pair has more senders.
Of course the second has more receivers. A further interesting case is 1,3,3; 2,2,3 with
coefficient 19 and degree 0.345, and 1,3,3; 1,1,1,4 with coefficient 20 and degree 0.351: there
is a barely discernible difference between the coefficients and degrees, yet the second group has
more receivers. It can be difficult to state a hard and fast rule about which is better.

Third, it is relatively easy to rank the partitions from most to least anonymous.



Senders          Coeff   Deg
1,1,1,1,1,1,1    5040    1
1,1,1,1,1,2      2520    0.919
1,1,1,2,2        1260    0.837
1,1,1,1,3        840     0.790
1,2,2,2          630     0.756
1,1,2,3          420     0.708
1,1,1,4          210     0.627
2,2,3            210     0.627
1,3,3            140     0.580
1,2,4            105     0.546
1,1,5            42      0.438
3,4              35      0.417
2,5              21      0.357
1,6              7       0.228
7                1       0






Comparing the 15 possible sender partitions against the receiver set 1,1,1,1,1,1,1, we can order them from most to least anonymous, as shown in the table above. This trend holds for other receiver sets as well. In general, more senders is better, although, as noted in the second point, there are exceptions to this rule when the receiver set also varies.
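With the receiver set 1,1,1,1,1,1,1, the coefficient reduces to n!/(s_1! s_2! ... s_k!). The ordering in the table above can therefore be regenerated by listing the 15 integer partitions of 7 and sorting them. The following is a short sketch. The helper names `partitions` and `singleton_receiver_coeff` are our own, and tied partitions (1,1,1,4 and 2,2,3, both 210) appear in generation order.

```python
from math import factorial, prod


def partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def singleton_receiver_coeff(senders):
    """Coefficient when every receiver gets one message: n! / prod(s_j!)."""
    return factorial(sum(senders)) // prod(factorial(s) for s in senders)


ranking = sorted(partitions(7), key=singleton_receiver_coeff, reverse=True)
for p in ranking:
    print(sorted(p), singleton_receiver_coeff(p))
```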

Finally, Figure 3 represents the n = 7 data graphically. It plots the different coefficients that can occur. Except near the extreme values, the changes in the degree tend to be linear in the variations of the coefficients.



Figure 3: Ordered values of the metric generated from all permutations of seven (plot title: Unique Anonymity Measures for all Variants of 7 to 7 Communications)

Another interesting feature, for any n, occurs when the number of messages sent by one party is equal to or greater than the total number of messages sent by all other parties. In this case the coefficient stops increasing, although, of course, the degree decreases. For example (we use the notation 1^6 to mean 1, 1, 1, 1, 1, 1):
Senders     Receivers   Coeff     Deg
1, 1^6      1, 1^6      5040      1
2, 1^6      2, 1^6      10,440    0.873
3, 1^6      3, 1^6      12,840    0.739
4, 1^6      4, 1^6      13,290    0.629
5, 1^6      5, 1^6      13,326    0.543
6, 1^6      6, 1^6      13,327    0.475
7, 1^6      7, 1^6      13,327    0.421
8, 1^6      8, 1^6      13,327    0.377
9, 1^6      9, 1^6      13,327    0.340
10, 1^6     10, 1^6     13,327    0.310
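The plateau at 13,327 can be derived directly. As before, we assume that the coefficient counts nonnegative integer matrices with the given row and column sums, an assumption that reproduces every row of the table above. Let k be the number of messages of the heavy sender and heavy receiver, and let m be the number of the six one-message senders whose message goes to one of the six one-message receivers. These pairings can be chosen in $\binom{6}{m}^2 m!$ ways, and every other message is then forced. The remaining $6-m$ one-message senders send to the heavy receiver, the remaining $6-m$ one-message receivers hear from the heavy sender, and the heavy sender's other $k-(6-m)$ messages go to the heavy receiver. That last step requires $k \ge 6-m$, so

$$
c_k = \sum_{m=\max(0,\,6-k)}^{6} \binom{6}{m}^{2} m!,
\qquad
\sum_{m=0}^{6} \binom{6}{m}^{2} m! = 1 + 36 + 450 + 2400 + 5400 + 4320 + 720 = 13{,}327 .
$$

For k ≥ 6 every term is present and $c_k$ is constant. The degree keeps falling because the normalising $\log((k+6)!)$ grows. For k = 5 only the m = 0 term is missing, which gives 13,326.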



The metric can also show how the size of the mix affects anonymity. Since any degree metric is a ratio, it does not explicitly take into account the number of users in the system. It is therefore legitimate to ask how, for a given sender/receiver pattern, the value evolves as the size of the mix increases. Figure 4 presents such an experiment, in which the ratio of messages sent from Alice to Bob remains the same as the number of users increases. More precisely, the figure shows how the degree of anonymity changes as the number of messages increases while the ratio of Alice's messages sent (and Bob's messages received) to the total number of messages sent stays the same. In each instance k messages are sent. Alice sends p messages and the k - p other users each send one message, while Bob receives p messages and the other k - p users each receive one message (for p = 1, 2, 3, 4, 5, 6, 7, 8, 9 and k = 9p + 1).
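The experiment behind Figure 4 can be outlined with a closed form. As elsewhere, this assumes that the coefficient counts nonnegative integer matrices with the given row and column sums. For senders (p, 1, ..., 1) and receivers (p, 1, ..., 1) with k messages in all, choose m of the k - p one-message senders to talk to m of the k - p one-message receivers. Every other message then involves Alice or Bob, which is feasible only when p ≥ (k - p) - m. The function names are hypothetical, and the schedule k = 9p + 1 is taken from the text. The sketch prints degrees rather than reproducing the figure itself.

```python
from math import comb, factorial, log


def heavy_pair_coefficient(p, k):
    """Coefficient for senders (p, 1, ..., 1) and receivers (p, 1, ..., 1).

    Assumes the coefficient counts nonnegative integer matrices with these
    margins.  m one-message senders are paired with m one-message receivers
    (comb(k - p, m)**2 * m! ways); all other messages are then forced, which
    needs Alice to cover the (k - p) - m unpaired receivers, i.e. p >= (k - p) - m.
    """
    singles = k - p
    return sum(comb(singles, m) ** 2 * factorial(m)
               for m in range(max(0, singles - p), singles + 1))


def heavy_pair_degree(p, k):
    """Degree log(coefficient) / log(k!)."""
    return log(heavy_pair_coefficient(p, k)) / log(factorial(k))


# Schedule described in the text: p = 1, ..., 9 and k = 9p + 1.
for p in range(1, 10):
    k = 9 * p + 1
    print(p, k, round(heavy_pair_degree(p, k), 3))
```

With p = k - 6, the same function reproduces the saturation table above, e.g. `heavy_pair_coefficient(2, 8)` gives 10,440.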



Figure 4: Scalability of the metric

6 Conclusion and Future Work

We have developed an elegant and easy way to calculate a metric for the degree of anonymity of an anonymous communication system. Our metric uses techniques from classical enumeration to count the possible ways senders and receivers can exchange different combinations of messages, without explicitly listing them. It naturally extends the metric of Edman et al. [6], who introduced the combinatorial approach to this area. Our metric is straightforward to calculate in Maple, and with it we were able to produce data highlighting a number of significant trends, such as those presented in Section 5.

Future work will focus on practical uses of the metric. One possible direction is to turn 
our metric into an attack. At the moment it is a measure of the degree of anonymity a user 
could expect, but it may be possible to further exploit the knowledge of the partition sizes 
to break the anonymity network. 

The "mostly linear" behaviour we mentioned in Section 5 deserves further study. While we feel linearity is an interesting property for a metric to have, Figure 3 shows that it does not appear in the extreme cases of full or zero anonymity. Further investigation is required to explore the limits of this approximation for large mixes, and how it can best be exploited in practice.

A further useful practical outcome would be a recommendation to Alice on what she 
should do at each stage. For example, would sending another message to Bob increase the 
likelihood of detection or not? Is she advised to send a message to someone else? Should she 
wait until the next round? It may be possible to provide guidance to her using knowledge 
of the partition sizes. 

Finally, future work could include more analysis of the data. This metric is a fast, simple tool for calculating the anonymity of the system, and, as we showed in Section 5, it allows a number of interesting features to be detected. Further analysis is likely to reveal more patterns.
A Example



This is the table of all possible sender-receiver pairs for n = 7 messages. The table includes 
the symmetric function coefficient as well as the degree of anonymity. Note that the notation 
1^7 means 1, 1, 1, 1, 1, 1, 1.
B Cardinality of equivalence classes



We can outline a procedure for calculating the cardinality of the equivalence classes as 
required for the metric of Gierlichs et al. 

We know the size of both the sender sets and the receiver sets. Suppose there are p 
sender sets and q receiver sets. Each receiver set will consist of elements from the sender 
sets in some combination. 

1. First determine all integer partitions of the size of each receiver set. This can be done, for example, with the algorithm from [18]. The integer partitions describe the possible ways in which the sender sets can be split within the receiver set. For example, if the receiver set has 3 elements then the partitions are 1, 1, 1; 2, 1; and 3. We could have one element from each of three sender sets, two elements from one sender set and one element from another, or all three elements from a single sender set. However, determining this properly is probably equivalent to explicitly determining the equivalence classes in the first place, and this laborious procedure is exactly what our symmetric function method avoids. We want to find all sequences $t_{1i}, t_{2i}, \ldots, t_{pi}$ such that $\sum_{a} t_{ai} = r_i$. For the same set of $t$'s, we also want all sequences $t_{j1}, t_{j2}, \ldots, t_{jq}$ such that $\sum_{b} t_{jb} = s_j$. This could be done either by exhaustive search or through an integer programming approach.

2. Since we have determined how many ways each sender set could be partitioned so that $t_{ji}$ elements go to receiver set $i$ for $1 \le i \le q$, we can construct a series of multinomial coefficients of the form
$$\frac{s_j!}{t_{j1}!\,t_{j2}!\cdots t_{jq}!}.$$

3. Now the elements within each receiver set of size $r_i$ can be arranged in $r_i!$ ways.

Each receiver set $i$ thus contributes a factor $r_i!$. Combining this with step 2, the cardinality of the equivalence class determined by the $t_{ji}$ is
$$\prod_{j=1}^{p} \frac{s_j!}{t_{j1}!\,t_{j2}!\cdots t_{jq}!} \;\prod_{i=1}^{q} r_i!.$$
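The procedure can be sketched using the exhaustive-search option from step 1. Instead of working through integer partitions or integer programming, the sketch enumerates every matrix t directly, where t[j][i] is the number of messages sender j sent to receiver i. It then applies steps 2 and 3 to each matrix. The function names are hypothetical. Every pairing of input messages with output messages falls into exactly one class, so the class sizes should sum to n!. Under our reading, the number of classes equals the coefficient used in the main text.

```python
from math import factorial, prod


def bounded_compositions(total, caps):
    """Yield tuples t with sum(t) == total and 0 <= t[i] <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for x in range(min(caps[0], total) + 1):
        for tail in bounded_compositions(total - x, caps[1:]):
            yield (x,) + tail


def equivalence_classes(senders, receivers):
    """Yield (t, size) for every matrix t with row sums `senders` and column
    sums `receivers`; size is the number of message-level matchings in the class."""

    def matrices(j, remaining):
        # Exhaustive search: choose sender j's row, then recurse on what is left.
        if j == len(senders):
            if not any(remaining):
                yield ()
            return
        for row in bounded_compositions(senders[j], remaining):
            left = tuple(c - x for c, x in zip(remaining, row))
            for rest in matrices(j + 1, left):
                yield (row,) + rest

    receiver_arrangements = prod(factorial(r) for r in receivers)  # step 3
    for t in matrices(0, tuple(receivers)):
        # Step 2: multinomial s_j! / (t_j1! ... t_jq!) for each sender j.
        size = prod(factorial(s) // prod(factorial(x) for x in row)
                    for s, row in zip(senders, t))
        yield t, size * receiver_arrangements


classes = list(equivalence_classes((1, 3, 3), (2, 2, 3)))
print(len(classes), sum(size for _, size in classes))  # 19 5040
```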



